Scalable Name Disambiguation using Multi-level Graph Partition
Abstract
When non-unique values are used as identifiers of entities, homonyms can cause confusion. In particular, when (part of) the “names” of entities are used as their identifier, the problem is often referred to as the name disambiguation problem, where the goal is to sort out the entities confused because of name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish “Vannevar Bush” from “George Bush”). In this paper, we study the scalability of the name disambiguation problem: how can it be resolved when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms? We first carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we adapt the multi-level graph partition technique to solve the large-scale name disambiguation problem. Our claim is empirically validated through experiments: our proposal shows orders-of-magnitude improvement in performance while maintaining equivalent or reasonable accuracy compared to competing solutions.
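To make the multi-level idea concrete, the following is a minimal sketch of a generic multi-level bisection, not the authors' implementation. It assumes that records sharing an ambiguous name are nodes and that edge weights are some similarity score; both the graph encoding and every function name here are hypothetical. The sketch coarsens the graph by heavy-edge matching, splits the small coarse graph by exhaustive minimum cut, and projects the labels back. Practical multi-level partitioners usually also refine the partition at each uncoarsening step, which is omitted here.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph {node: {neighbour: weight}}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def heavy_edge_matching(graph):
    """Map each node to a supernode id, pairing it with its heaviest free neighbour."""
    mapping, next_id = {}, 0
    for u in sorted(graph):
        if u in mapping:
            continue
        free = [(w, v) for v, w in graph[u].items() if v not in mapping and v != u]
        mapping[u] = next_id
        if free:
            mapping[max(free)[1]] = next_id
        next_id += 1
    return mapping


def coarsen(graph, mapping):
    """Contract matched pairs, summing the weights of parallel edges."""
    coarse = defaultdict(lambda: defaultdict(float))
    for u, nbrs in graph.items():
        cu = mapping[u]
        coarse[cu]  # touch the key so isolated supernodes are kept
        for v, w in nbrs.items():
            if mapping[v] != cu:
                coarse[cu][mapping[v]] += w
    return {u: dict(nbrs) for u, nbrs in coarse.items()}


def brute_force_bisect(graph):
    """Exhaustive minimum cut with both sides non-empty (small graphs only)."""
    nodes = sorted(graph)
    if len(nodes) < 2:
        return {u: 0 for u in nodes}
    best_cut, best_labels = float("inf"), None
    # The first node is fixed on side 0 so each split is enumerated once.
    for mask in range(1, 2 ** (len(nodes) - 1)):
        labels = {nodes[0]: 0}
        for i, u in enumerate(nodes[1:]):
            labels[u] = (mask >> i) & 1
        cut = sum(w for u in nodes for v, w in graph[u].items()
                  if labels[u] != labels[v]) / 2
        if cut < best_cut:
            best_cut, best_labels = cut, labels
        if best_cut == 0:
            break
    return best_labels


def multilevel_bisect(graph, max_coarse_nodes=4):
    """Coarsen, partition the coarsest graph, then project labels back."""
    levels, current = [], graph
    while len(current) > max_coarse_nodes:
        mapping = heavy_edge_matching(current)
        if len(set(mapping.values())) == len(current):
            break  # nothing was matched, so the graph has no edges left
        levels.append(mapping)
        current = coarsen(current, mapping)
    labels = brute_force_bisect(current)
    for mapping in reversed(levels):
        labels = {u: labels[c] for u, c in mapping.items()}
    return labels


# Two tightly linked groups of records joined by one weak edge.
example = build_graph([
    (0, 1, 5), (0, 2, 5), (1, 2, 5),
    (3, 4, 5), (3, 5, 5), (4, 5, 5),
    (2, 3, 1),
])
print(multilevel_bisect(example))
```

The expensive exhaustive step only ever runs on the coarsest graph, which is the reason a multi-level scheme can scale to inputs where partitioning the original graph directly would be too slow.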
Similar resources
Optimizing Teleportation Cost in Multi-Partition Distributed Quantum Circuits
There are many obstacles to implementing large-scale quantum circuits, so distributed quantum systems are an appropriate solution for these circuits. Therefore, reducing the number of quantum teleportations improves the cost of implementing a quantum circuit. The minimum number of teleportations can be considered as a measure of the efficiency of distributed quantum systems....
Name Disambiguation from link data in a collaboration graph
In a social community, multiple persons may share the same name, phone number, or some other identifying attributes. This, along with other phenomena such as name abbreviation, name misspelling, and human error, leads to erroneous aggregation of records of multiple persons under a single reference. Such mistakes affect the performance of document retrieval, web search, database integration, and ...
Improvement of the Efficiency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-level Framework
Parallel graph partitioning is a difficult issue, because the best sequential graph partitioning methods known to date are based on iterative local optimization algorithms that do not parallelize nor scale well. On the other hand, evolutionary algorithms are highly parallel and scalable, but converge very slowly as problem size increases. This paper presents methods that can be used to reduce p...
Name List Only? Target Entity Disambiguation in Short Texts
Target entity disambiguation (TED), the task of identifying target entities of the same domain, has been recognized as a critical step in various important applications. In this paper, we propose a graph-based model called TremenRank to collectively identify target entities in short texts given a name list only. TremenRank propagates trust within the graph, allowing for an arbitrary number of ta...